[ray_client]: Add more retry logic #13478
python/ray/util/client/worker.py
Do we need three different retry loops for the different stages? Why not have just a single one that tests this last condition?
|
Because it could fail at any one of these stages. One big retry loop would end up looking like a finite state machine, and we can't "test this last condition" alone. If that is what you'd prefer, I'll go do that. |
|
Why can't we test the last condition alone? If C succeeds, that implies {A, B} are fine, right? I think a loop that just tests C would be sufficient. If the goal is better error messages, we could also check {A, B} before C inside the loop. |
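For illustration, here is a minimal sketch of the single-loop shape being suggested. It is not the code in `python/ray/util/client/worker.py`; `wait_until_ready`, `check_ready`, `diagnose`, and the backoff parameters are all hypothetical names chosen for this example. The loop only tests the final condition (C), and the optional `diagnose` hook is where checks of the earlier stages ({A, B}) could go to produce a better error message.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    check_ready: Callable[[], bool],
    retries: int = 3,
    initial_delay_s: float = 0.01,
    diagnose: Optional[Callable[[], str]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry `check_ready` (condition C) with exponential backoff.

    Returns the 1-based attempt number that succeeded. Raises
    ConnectionError once every attempt has failed. All names here are
    hypothetical; this is a sketch, not Ray's implementation.
    """
    delay = initial_delay_s
    for attempt in range(1, retries + 1):
        try:
            # Only the last stage is tested: if C holds, the earlier
            # stages must have succeeded as well.
            if check_ready():
                return attempt
        except ConnectionError:
            # Treat a connection failure like "not ready yet".
            pass
        if attempt < retries:
            sleep(delay)
            delay *= 2  # back off before the next attempt
    # Optionally inspect the earlier stages, purely for the error message.
    detail = f" ({diagnose()})" if diagnose is not None else ""
    raise ConnectionError(f"not ready after {retries} attempts{detail}")


class FakeServer:
    """Stand-in for a server that becomes ready on the N-th check."""

    def __init__(self, ready_after: int) -> None:
        self.calls = 0
        self.ready_after = ready_after

    def check(self) -> bool:
        self.calls += 1
        return self.calls >= self.ready_after
```

A usage example under the same assumptions: `wait_until_ready(FakeServer(3).check, retries=5)` succeeds on the third attempt, while a server that never becomes ready within `retries` attempts raises `ConnectionError`. Passing a no-op `sleep` keeps a unit test for this loop fast.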
|
Btw, it would be good to have a test here |
Change-Id: I4e96d1c4dd7252b754482892597280504a4ba63c
Change-Id: I54c890e7e5d6452bcb58312d54c746e5c8e556bc
Change-Id: I23dffe65feb4c778985be724f3ab213a61eb2da8
Change-Id: I000bb467e61ca5f2f4e0345d07ca01acf143956f
Change-Id: I746c4dda014b8b5b7752c07299f9b42951fe6cb8
44c0b0e to 01b29fa
Change-Id: I9102433a212846af77ac7a23322e35b48d8464b8
Change-Id: I443e911ca6e8e6c5fcc91d2ec5a4990d81215c1e
|
Sure. Removed state machine and added test. |
|
Change-Id: I3669ea2734de9e7aad8522a898d3248aa560892a
Change-Id: Iae12027d2340e81832305a60a297f85deb8f2919
|
Tests / lint still failing |
Change-Id: Ia3e8df94ea689b38d87f8bbafb5a1142edd7b3e1
|
I'm starting to run out of ideas, but we'll see if this goes |
|
I think I figured it out |
Change-Id: Ie691ed9ac082639c583eec4f6af7578b0841c744
|
This time for sure -- I got the reproduction locally and tracked it down |
This reverts commit bc386dd.
Related issue number
Closes #13446
Checks
I've run scripts/format.sh to lint the changes in this PR.